Add robust parallel trends testing with Wasserstein distance #3
Merged
- Add check_parallel_trends_robust(), which uses the Wasserstein (Earth Mover's) distance for a distributional comparison of pre-treatment outcome changes
- Include a permutation-based p-value for statistical inference
- Add the Kolmogorov-Smirnov test as a complementary distributional test
- Add equivalence_test_trends() using the TOST (two one-sided tests) procedure
- Compute normalized Wasserstein and variance-ratio diagnostics
- Add 9 new tests for the robust parallel trends functionality
- Update the README with usage examples for all three approaches

The Wasserstein distance is more robust than a simple slope comparison because it captures differences in the full shape of the distribution, not just the mean, which makes it better suited to heterogeneous effects.
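A minimal, standard-library sketch of the comparisons described above, applied to two samples of pre-treatment outcome changes (treated vs. control). The function names are hypothetical and this is not the library's check_parallel_trends_robust() or equivalence_test_trends(); dividing the Wasserstein distance by the pooled SD and using a normal (rather than Student-t) approximation in TOST are assumptions made to keep the example dependency-free.

```python
import math
import random
import statistics
from statistics import NormalDist


def _ecdf_steps(x, y):
    """Yield (t, F_x(t), F_y(t)) at every pooled support point t, in increasing order."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    for t in sorted(set(xs) | set(ys)):
        while i < len(xs) and xs[i] <= t:
            i += 1
        while j < len(ys) and ys[j] <= t:
            j += 1
        yield t, i / len(xs), j / len(ys)


def wasserstein_1d(x, y):
    """Wasserstein-1 (Earth Mover's) distance: the integral of |F_x - F_y| over t."""
    steps = list(_ecdf_steps(x, y))
    # Both empirical CDFs are constant on [t, t_next), so the integral is a finite sum.
    return sum(abs(fx - fy) * (t_next - t)
               for (t, fx, fy), (t_next, _, _) in zip(steps, steps[1:]))


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: sup_t |F_x(t) - F_y(t)|."""
    return max(abs(fx - fy) for _, fx, fy in _ecdf_steps(x, y))


def permutation_pvalue(x, y, stat=wasserstein_1d, n_perm=999, seed=0):
    """Share of group-label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = stat(x, y)
    pooled, n_x = list(x) + list(y), len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


def diagnostics(x, y):
    """Scale-free summaries; the pooled-SD normalization is an assumption."""
    pooled_sd = statistics.stdev(list(x) + list(y))
    return {
        "normalized_wasserstein": wasserstein_1d(x, y) / pooled_sd,
        "variance_ratio": statistics.variance(x) / statistics.variance(y),
    }


def tost_equivalence(x, y, margin, alpha=0.05):
    """TOST on the mean difference, using a large-sample normal approximation."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se)      # H0: diff >= +margin
    p_value = max(p_lower, p_upper)
    # Equivalence is concluded only if both one-sided tests reject.
    return diff, p_value, p_value < alpha
```

The permutation p-value counts the observed split itself (the `1 +` terms), so it is never exactly zero.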
igerber pushed a commit that referenced this pull request on Jan 4, 2026
igerber added a commit that referenced this pull request on Apr 17, 2026
- P1 #1/#2: Add a _validate_group_constant_strata_psu() helper and call it from fit() after the weight_type/replicate-weights checks. The dCDH IF expansion psi_i = U[g] * (w_i / W_g) treats each group as the effective sampling unit; when strata or PSU vary within a group, it silently spreads horizon-specific IF mass across observations in different PSUs, contaminating the stratified-PSU variance. Walk back the overstated claim in the comment at old line 669 to match. Within-group-varying weights remain supported.
- P1 #3: _survey_se_from_group_if now filters zero-weight rows before np.unique/np.bincount, so NaN or non-comparable group IDs on excluded subpopulation rows cannot crash the SE factorization. psi stays full-length, with zeros in the excluded positions, to preserve alignment with resolved.strata / resolved.psu inside compute_survey_if_variance.
- REGISTRY.md line 652 Note updated: it now explicitly states the within-group-constant strata/PSU requirement and the support for within-group-varying weights.
- Tests: new TestSurveyWithinGroupValidation class (4 tests: rejects varying PSU, rejects varying strata, accepts varying weights, and ignores zero-weight rows during the constancy check) plus TestZeroWeightSubpopulation.test_zero_weight_row_with_nan_group_id. All 268 targeted tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
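A sketch of the two P1 fixes, assuming plain Python lists for group IDs, strata, PSUs, and weights, and a dict of per-group IF values U[g]. The names are illustrative stand-ins for _validate_group_constant_strata_psu() and the IF expansion inside _survey_se_from_group_if, not their actual code.

```python
from collections import defaultdict


def validate_group_constant_strata_psu(groups, strata, psu, weights):
    """Raise ValueError if strata or PSU vary within a group.

    Zero-weight rows (excluded subpopulation) are skipped, so their IDs, even
    NaN group IDs, never enter the constancy check.
    """
    first_seen = {}
    for g, s, p, w in zip(groups, strata, psu, weights):
        if w <= 0:
            continue
        if g not in first_seen:
            first_seen[g] = (s, p)
            continue
        s0, p0 = first_seen[g]
        if s != s0:
            raise ValueError(f"strata vary within group {g!r}")
        if p != p0:
            raise ValueError(f"PSU varies within group {g!r}")
    # Weights are free to vary within a group; only strata/PSU must be constant.


def expand_group_if(groups, weights, group_if):
    """Spread each group's IF value U[g] to its rows: psi_i = U[g] * (w_i / W_g).

    psi stays full-length, with 0.0 at zero-weight rows, so it remains aligned
    with row-level strata/PSU arrays downstream.
    """
    group_weight = defaultdict(float)
    for g, w in zip(groups, weights):
        if w > 0:
            group_weight[g] += w
    return [group_if[g] * w / group_weight[g] if w > 0 else 0.0
            for g, w in zip(groups, weights)]
```

Within each group the expanded values sum back to U[g], which is why a group whose rows span several PSUs would leak that mass across PSUs.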
igerber added a commit that referenced this pull request on Apr 19, 2026
Addresses the second-round CI review findings:
- P1 false-pass (remaining): removed five phase-local try/except blocks that swallowed sub-step exceptions (HonestDiD M-grids in brand-awareness and BRFSS, dCDH HonestDiD and heterogeneity refit, dose-response dataframe extraction). Exceptions now escape, the phase is marked ok=false, and run_scenario's atexit handler exits nonzero. The fix caught a real API-usage bug on its first rerun: the dose_response extract phase tried to pull the event_study level from a result fit with aggregate="dose"; the event_study fit lives in a dedicated phase, so that level is removed from the extraction loop.
- P2 scenario-spec drift: the BRFSS scenario text now says pweight TSL stage-2 (matching the design returned by aggregate_survey), not "Full replicate-weight path"; the dCDH reversible scenario text now says heterogeneity="group" (matching the script), not "cohort".
- P3 path leakage: tracemalloc output now scrubs $HOME, the repo root, and site-packages before writing the committed txt.

Drift-prevention layer:
- gen_findings_tables.py reads every JSON baseline and rewrites the numerical tables in performance-plan.md between <!-- TABLE:start <id> --> / <!-- TABLE:end <id> --> markers. Tables now re-derive from data on every rerun, eliminating the hand-edit drift the prior review flagged. Narrative prose stays hand-written by design, forcing a human re-read of the findings when numbers shift.

Findings refresh (the numbers moved slightly; three narrative claims needed updating):
- "Rust marginally slower than Python on JK1 at large scale" -> removed; fresh data has Rust and Python within noise on brand awareness at large scale (JK1 phase 0.577s Py / 0.562s Rust, totals 1.03 / 1.04).
- "ImputationDiD consistently dominant phase at all scales" -> narrowed to "dominant under Python; tied with SunAbraham under Rust at large".
- "Nine-figures of MB" in memory finding #3 was a phrasing error (literally 100+ TB); corrected to "mid-100s of MB".

Priority of optimization opportunities refreshed against the new data:
- #1 aggregate_survey precompute stratum scaffolding: High (unchanged, now strongly supported: 24.75s Python / 25.41s Rust at 1M rows, 100% of chain runtime, growth only +31 MB).
- #2 Staggered CS working-memory audit: Low, with an explicit bump trigger (Rust large crosses the 512 MB Lambda line).
- #5 Rust-port JK1 replicate fit loop: demoted from Medium to Low; the "Rust regression to fix" leg of the rationale is gone because Rust is no longer slower.

Net: one clear priority (the aggregate_survey fix), four optional follow-ups. Still measurement only. No changes under diff_diff/ or rust/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
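The marker-delimited regeneration in gen_findings_tables.py could look roughly like this. The function names, the table layout, and the assumption that each start marker is followed by a newline are illustrative; only the <!-- TABLE:start <id> --> / <!-- TABLE:end <id> --> marker format comes from the commit message.

```python
import re


def render_table(headers, rows):
    """Render rows as a GitHub-flavored Markdown table."""
    lines = ["| " + " | ".join(headers) + " |",
             "|" + " --- |" * len(headers)]
    lines += ["| " + " | ".join(str(cell) for cell in row) + " |" for row in rows]
    return "\n".join(lines)


def replace_table(markdown, table_id, new_table):
    """Swap whatever sits between one table's start/end markers; prose outside is untouched."""
    tid = re.escape(table_id)
    pattern = re.compile(
        rf"(<!-- TABLE:start {tid} -->\n)(.*?)(<!-- TABLE:end {tid} -->)", re.DOTALL
    )
    if pattern.search(markdown) is None:
        raise KeyError(f"no TABLE markers for id {table_id!r}")
    # A function replacement stops re from treating backslashes in the table as escapes.
    return pattern.sub(lambda m: m.group(1) + new_table.rstrip("\n") + "\n" + m.group(3),
                       markdown, count=1)
```

Because only the span between markers is rewritten, hand-written narrative around each table survives every rerun unchanged.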
igerber added a commit that referenced this pull request on Apr 22, 2026
P1 #1 (Stute tie-safe CvM): The paper defines c_G(d) = Σ 1{D ≤ d} · eps, with c_G(D_g) evaluated AT each observation's dose, so tied observations share the post-tie cumulative sum. My naive cumsum over sorted residuals produced partial within-tie sums that were row-order-dependent. Fix: after the cumsum, replace the values within each tie block with the block's last cumsum via np.unique + np.repeat. `_cvm_statistic` now accepts `d_sorted` and collapses tie blocks before squaring. Regression test `test_cvm_statistic_tie_safe_order_invariance` pins order-invariance on duplicate doses at atol=1e-14; `test_stute_order_invariance_with_duplicate_doses` validates the end-to-end stute_test contract.

P1 #2 (Exact-linear fit must fail to reject, not return NaN): For an exact dy = a + b·d, Assumption 8 holds exactly and the correct outcome is p=1, reject=False. My previous var(eps)<=0 check routed this case to NaN. Fix: dropped the var(eps) degeneracy branch from stute_test (the bootstrap naturally produces p=1 when eps=0 exactly). Added a scale-relative short-circuit (sum(eps²) ≤ 1e-24 · sum(dy²)) in both stute_test and yatchew_hr_test so that FP noise (eps ~ 1e-16 from IEEE arithmetic on dy = 1 + 2*d) doesn't defeat the short-circuit by producing non-zero but tiny OLS residuals. Yatchew exact-linear now returns (t_stat_hr=-inf, p=1, reject=False) rather than NaN. Regressions: TestStuteTest.test_exact_linear_returns_p1_not_nan, TestYatchewHRTest.test_exact_linear_returns_p1_not_nan.

P1 #3 (HADPretestReport.all_pass contract): Previously `all_pass = not (reject or reject or reject)` could be True while `verdict` said "inconclusive - X NaN". Fix: gate all_pass on every constituent p-value being finite AND no test rejecting. Updated the docstring. Regression: TestCompositeWorkflow.test_all_pass_false_when_any_test_nan.

P2 #1 (QUG negative-dose guard): HAD doses must be non-negative (paper Section 2). The raw qug_test API was silently folding d < 0 rows into the n_excluded_zero counter (the filter was `d > 0`). Fix: front-door ValueError on any d < 0. Regression: TestQUGTest.test_negative_dose_raises.

P3 #1 (QUG np.partition): REGISTRY claims O(G) via np.partition, but the code was using np.sort. Switched qug_test to np.partition(d_nz, 1), which guarantees partitioned[0] ≤ partitioned[1] = D_{(2)}, i.e., partitioned[0] = D_{(1)}. Tight closed-form parity at atol=1e-12 still holds.

P3 #2 (REGISTRY n_bootstrap default): REGISTRY said "Default n_bootstrap = 499" but the code ships 999. Updated REGISTRY to match the code and added a note about the n_bootstrap >= 99 front-door validation.

Test count: 47 -> 53.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
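A pure-Python sketch of the tie-collapsing step and the scale-relative short-circuit, assuming list inputs. A backward pass stands in for the np.unique + np.repeat trick; the 1/G² scaling in `cvm_statistic` is an illustrative assumption rather than the paper's normalization, and all helper names are hypothetical.

```python
def tie_safe_cumsum(d, eps):
    """c(D_g) = sum of eps_h over all h with D_h <= D_g, returned in dose order.

    A plain running sum gives tied doses partial, row-order-dependent values;
    collapsing each tie block to its last running sum makes ties share the
    post-tie sum, as the definition requires.
    """
    order = sorted(range(len(d)), key=lambda i: d[i])
    d_sorted = [d[i] for i in order]
    sums, running = [], 0.0
    for i in order:
        running += eps[i]
        sums.append(running)
    # Backward pass: each value in a tie block inherits the block's last running sum.
    for k in range(len(sums) - 2, -1, -1):
        if d_sorted[k] == d_sorted[k + 1]:
            sums[k] = sums[k + 1]
    return d_sorted, sums


def cvm_statistic(d, eps):
    """Cramer-von Mises-type statistic; the 1/G**2 scaling here is an assumption."""
    _, sums = tie_safe_cumsum(d, eps)
    return sum(c * c for c in sums) / len(d) ** 2


def is_exact_fit(eps, dy, rel_tol=1e-24):
    """Scale-relative exact-fit check: sum(eps^2) <= rel_tol * sum(dy^2)."""
    return sum(e * e for e in eps) <= rel_tol * sum(v * v for v in dy)
```

Two row orders of the same tied data both yield [1, 2, 2, 6] after collapsing, whereas the raw running sums would be [1, 3, 2, 6] and [1, 0, 2, 6].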
This was referenced Apr 25, 2026
igerber added a commit that referenced this pull request on Apr 25, 2026
dCDH by_path + placebo: per-path backward-horizon placebos (Wave 2 #3)
igerber added a commit that referenced this pull request on Apr 26, 2026
R5 was ✅ Looks good; only P3 polish remained. All addressed:
- P3 #1 (exact-pin nprobust): The parity contract runs through nprobust numerical paths (DIDHAD's local-linear bandwidth and bias-correction calls), so a fresh regeneration could drift if CRAN serves a newer nprobust. Pin nprobust == 0.5.0 in both the R generator's stopifnot guard and the parity test's metadata assertion, alongside DIDHAD and YatchewTest.
- P3 #2 (workflow docstring): did_had_pretest_workflow's top-level docstring still said "Eq 18 linear-trend detrending is a Phase 4 follow-up", which contradicts the shipped trends_lin behavior. Updated it to describe the forwarding contract (trends_lin → joint_pretrends_test + joint_homogeneity_test, consumed-placebo skip path on minimal panels). Same fix on the StuteJointResult class docstring.
- P3 #3 (parity test horizon-shape assertions): Added an explicit "missing in Python" assertion in _zip_r_python: every R-mapped event time must be present in Python's event_times (catches future horizon-shape regressions where Python silently drops a horizon R requested). Added an effects+placebo row-count sanity check in test_yatchew_t_stat_parity (uses the previously-unused effects/placebo parametrize values to catch fixture drift).

Stats: 540 tests pass, 0 regressions. No estimator/methodology changes; all P3 polish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
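A sketch of the "missing in Python" guard, assuming each side's estimates arrive as a dict keyed by event time. `zip_r_python` and `check_row_count` are hypothetical stand-ins for _zip_r_python and the row-count check in test_yatchew_t_stat_parity; the one-row-per-horizon rule in `check_row_count` is an assumption.

```python
def zip_r_python(r_by_time, py_by_time):
    """Pair R and Python estimates by event time, failing if Python drops any R horizon.

    Python-only horizons are ignored; only horizons R requested are compared.
    """
    missing = sorted(set(r_by_time) - set(py_by_time))
    if missing:
        raise AssertionError(f"event times missing in Python: {missing}")
    return [(t, r_by_time[t], py_by_time[t]) for t in sorted(r_by_time)]


def check_row_count(rows, effects, placebo):
    """Sanity check (assumed rule): one row per effect horizon plus one per placebo horizon."""
    assert len(rows) == effects + placebo, (len(rows), effects, placebo)
```

Iterating over R's horizons, rather than the intersection of both sides, is what turns a silently dropped horizon into a test failure.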
igerber added a commit that referenced this pull request on May 14, 2026
R1 P1 #1: pin remaining dismissal invariants
- The comment block claims 4 invariants hold, but only invariants #1 (no execution) and #2 (fork-skip) had test coverage. Add 3 tests:
  - test_workflow_codex_step_uses_read_only_sandbox (the other half of invariant #1: sandbox: read-only)
  - test_workflow_resolve_pr_sets_head_sha_from_api (invariant #4: head_sha is API-pinned, not taken from the event payload)
  - test_workflow_comment_triggers_require_author_association (invariant #3: comment triggers gated on OWNER/MEMBER/COLLABORATOR)

R1 P1 #2: make the guard test fail-closed across run scalar styles
- The prior regex only matched `run: |` literal blocks; inline `run: pytest` and folded `run: >` bypassed the scan entirely.
- Extract a _extract_all_run_content static method that handles all three scalar styles (literal `|` with chomping variants, folded `>` with variants, and inline single-line). Both existing tests and a new python-file-exec test now use it.
- Expand FORBIDDEN_EXECUTION_PATTERNS to include `pip3 install` and `npm ci` (omissions the reviewer named).
- Add test_workflow_no_python_file_execution_against_workspace: a regex flags `python(3)? <path>.py` invocations against workspace-relative paths (PR-head bytes) and allowlists /tmp/-prefixed paths (BASE-staged via git show). Inline scripts (-c) and module invocations (-m) don't capture .py tokens, so they are naturally excluded.

Test-the-test verified that inline + folded + literal + npm ci + python workspace all fire, and python /tmp/ correctly does not. All 24 workflow tests pass.
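A simplified sketch of scalar-style-agnostic `run:` extraction plus the workspace `.py` execution check. It is line-based rather than a real YAML parser, assumes block-scalar bodies are indented deeper than their `run:` key, and its names are hypothetical stand-ins for _extract_all_run_content and the test's regex.

```python
import re

# A `run:` key, optionally written as a list item ("- run:"), and whatever follows it.
RUN_KEY = re.compile(r"^(?P<indent>\s*)(?:-\s+)?run:\s*(?P<rest>.*)$")
# `python` or `python3` followed by a token ending in .py; `-c` / `-m` forms never match.
PY_FILE_EXEC = re.compile(r"\bpython3?\s+(?P<path>[^\s;|&]+\.py)\b")


def extract_run_content(workflow_text):
    """Return the body of every run: step, for inline, literal (|) and folded (>) styles."""
    lines = workflow_text.splitlines()
    bodies, i = [], 0
    while i < len(lines):
        m = RUN_KEY.match(lines[i])
        i += 1
        if not m:
            continue
        rest = m.group("rest").strip()
        if rest[:1] not in ("|", ">"):
            bodies.append(rest)  # inline single-line scalar
            continue
        # Block scalar with any chomping indicator: take lines indented past the key.
        indent, block = len(m.group("indent")), []
        while i < len(lines) and (not lines[i].strip()
                                  or len(lines[i]) - len(lines[i].lstrip()) > indent):
            block.append(lines[i].strip())
            i += 1
        bodies.append("\n".join(block))
    return bodies


def workspace_python_execs(run_bodies):
    """Python file paths executed from the workspace; /tmp/ paths are allowlisted."""
    return [m.group("path")
            for body in run_bodies
            for m in PY_FILE_EXEC.finditer(body)
            if not m.group("path").startswith("/tmp/")]
```

Scanning the extracted bodies, rather than matching `run: |` directly, is what keeps the guard fail-closed when a step switches scalar style.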
igerber added a commit that referenced this pull request on May 20, 2026
…prose L194 checklist update (this PR) said the Eq. 18 detrending variant shipped in PR #389; the explanatory prose immediately below at L200 still said "Phase 4 extends it with the Eq (18) detrending" as if it were future work. Rewritten to past tense matching the L194 closure and the REGISTRY § "Note (Phase 4 — Eq 17 / Eq 18 linear-trend detrending shipped)" framing. Only the Pierce-Schott numerical replication remains waived (REGISTRY Deviations Note #3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request on May 22, 2026
Local Codex review on commit 79e0962 returned ✅ with 3 P3s (all documentation/coverage, no actionable P0/P1). Per the test-coverage P3 upgrade rule (feedback_test_coverage_gap_treat_as_actionable.md), addressing all three:
- P3 #1 (Code Quality): `_compute_cr2_bm_contrast_dof` was missing the `ndim` validation that the parallel one-way `_compute_bm_dof_from_contrasts` helper has, so a stray `(k,)` 1-D vector would die with a low-level indexing error instead of a contract error. Added the same shape-tuple check pattern (`if contrasts.ndim != 2 or contrasts.shape[0] != k`).
- P3 #2 (Docs): two doc surfaces were stale after the feature lift:
  - `estimators.py:68-71`: the base estimator docstring still said MPD did NOT support cluster + hc2_bm. Rewrote it to describe the new cluster-aware contrast-DOF support and flag survey CR2-BM as the remaining gate.
  - `tests/test_linalg_hc2_bm.py`: the module banner still said clustered CR2 BM was "deferred to a follow-up". Updated it to describe both the per-coefficient and the new compound-contrast DOF surfaces, and narrowed the deferral to the weighted CR2-BM case only.
- P3 #3 (Tests): the new MPD test only asserted finite output, so a regression that silently fell back to the shared n-k DOF would still pass. Added `test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof`, which fits MPD on the new R `mpd_clustered_avg_att_dof` fixture and recovers the implied Satterthwaite DOF by inverting `avg_p_value = 2 * (1 - t.cdf(|avg_t_stat|, df))` via scipy.brentq. The recovered DOF must match the R `Wald_test(test="HTZ")$df_denom` at atol=1e-6. The test also pins that the implied DOF is much smaller than the n-k fallback (~39 here), which catches a regression to the shared df path.

All 254 tests in tests/test_linalg_hc2_bm.py + test_estimators_vcov_type.py + test_estimators.py pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
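The DOF-recovery step in that test inverts the two-sided t p-value for df. A standard-library sketch of the same inversion: Simpson's rule for the Student-t CDF and bisection in place of scipy.brentq. It relies on the p-value decreasing in df for fixed |t|, and the function names are hypothetical.

```python
import math


def two_sided_p(t, df, n=2000):
    """Two-sided Student-t p-value 2 * (1 - CDF(|t|)), via Simpson's rule on [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / n  # n must be even for Simpson's rule
    s = pdf(0.0) + pdf(a)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * pdf(k * h)
    # CDF(|t|) = 0.5 + integral_0^|t| pdf, so 2 * (1 - CDF) = 1 - 2 * integral.
    return 1.0 - 2.0 * s * h / 3.0


def implied_dof(t, p, lo=1.0, hi=1e4, iters=100):
    """Solve two_sided_p(t, df) == p for df by bisection (stand-in for brentq)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if two_sided_p(t, mid) > p:
            lo = mid  # tails still too heavy: the answer has more DOF
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For df = 1 (Cauchy), `two_sided_p(1.0, 1.0)` is 0.5 up to integration error, a convenient check on the density constant.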
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request on May 22, 2026
Add tests/test_methodology_had.py (6 classes, 34 tests) with paper-
equation-numbered Verified Components walk-through against de
Chaisemartin, Ciccia, D'Haultfoeuille & Knau (2026) arXiv:2405.04465v6
covering Equations 3 / 7 / 11 / 18 / 29 and Theorems 1 / 3 / 4 / 7:
- TestHADTheorem1Design1Prime: Eq. 3 Design 1' WAS recovery + N(0,1)
coverage check at n_replicates=200, G=1000 with KS-stat <= 0.05 and
empirical 95% coverage >= 0.90
- TestHADTheorem3MassPoint: Eq. 11 / Theorem 3 mass-point WAS_{d_lower}
recovery + Wald-IV closed-form equivalence at atol=1e-9
- TestHADTheorem4QUG: Theorem 4 limit-law distributional match against
closed-form F(t) = t/(1+t) at KS-stat <= 0.05, n_draws=5000, G=2000
- TestHADTheorem7YatchewHR: Eq. 29 standard-normal limit, paper-literal
sigma2_diff = 1/(2G) normalization lock
- TestHADJointStute: Section 4.2 step 2 + 4.3 mean-independence variant
H0 fail-to-reject + H1 reject under nonlinear DGP
- TestHADDeviations: equal-weighting invariance, sup-t bootstrap gating,
staggered-timing fail-closed ValueError, safe_inference joint NaN
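The Theorem 4 check compares simulated draws with the closed-form F(t) = t/(1+t) through a KS statistic. This sketch shows that comparison in plain Python. As an assumed stand-in sampler it uses X/Y with X, Y independent Exp(1), whose CDF is exactly t/(1+t) because P(X/Y ≤ t) = E[1 - e^(-tY)] = 1 - 1/(1+t); it does not simulate the QUG statistic itself, and the helper names are hypothetical.

```python
import random


def ks_one_sample(samples, cdf):
    """sup_t |F_n(t) - F(t)|, checked on both sides of every empirical jump."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def exp_ratio_draws(n, seed=0):
    """X / Y with X, Y iid Exp(1); P(X / Y <= t) = t / (1 + t) exactly."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) / rng.expovariate(1.0) for _ in range(n)]
```

With n_draws=5000, the 5% KS critical value is about 1.36/√5000 ≈ 0.019, so a KS-stat ≤ 0.05 threshold leaves a wide margin against Monte Carlo noise.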
Add Assumption 5/6 non-testability documentation:
- HeterogeneousAdoptionDiD class docstring: new "Non-testable assumptions
(paper Section 3.1.2)" Notes block citing Section 3.1.2 + cross-
referencing the existing fit-time UserWarning at had.py:3372-3390
- qug_test / stute_test / yatchew_hr_test / did_had_pretest_workflow:
"Scope (what this test does NOT cover)" clauses in Notes sections
explicitly stating tests verify ADJACENT assumptions (4 / 7 / 8) and
CANNOT test Assumptions 5 or 6
Close paper-review checklist L182-L194 + REGISTRY HAD Implementation
Checklist L2602-L2604: Phase 1a/1b/1c implementation closures (panel
validator, design paths, local-linear backend, bias-corrected CI),
staggered-timing fail-closed ValueError, zero-dose UserWarning filter,
Assumption 5/6 non-testability documentation. L2604 (covariates=
Theorem 6 NotImplementedError) remains [ ] with explicit TODO.md
cross-reference (currently a Python TypeError, fail-closed).
Waive Phase-4 validation-harness items #1 (Pierce-Schott 2016 Figure 2)
+ #2 (Table 1 coverage rates) with documented rationale: R parity at
atol=1e-8 in test_did_had_parity.py (3 DGPs x 5 method combos, bit-exact
via rtol=0) is a strictly stronger correctness anchor than coverage-rate
MC. Paper Section 5.2 itself self-acknowledges NP estimators too noisy
to be informative on the LBD-restricted PNTR panel.
REGISTRY HAD section gains a consolidated Deviations block (5 entries
with a framing header distinguishing Notes #1-#2 = implementation choices
from Notes #3-#4 = waived validation-harness work and from #5 = library
extension for staggered-timing fail-closed). Existing scattered Note
entries at L2313 (equal-weighting) and L2398 (sup-t gating) referenced
from the new block.
METHODOLOGY_REVIEW.md HAD row promoted In Progress -> Complete, detail
section rewritten with Verified Components / Test Coverage / Corrections
Made / Deviations / Outstanding Concerns structure mirroring the Bacon /
TripleDifference Complete-row layout.
TODO.md: existing Phase 4 Pierce-Schott row annotated with the 2026-05-20
waiver decision + rationale; new follow-up row for covariates= Theorem 6
NotImplementedError + Theorem 6 pointer (Low priority).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request on May 22, 2026
- P3 (Methodology): the promoted HAD materials described the Eq. 17/18 `trends_lin=True` linear-trend-detrended variant as "deferred per Phase 4". This conflated TWO different things: (a) the FEATURE, which is shipped via the `trends_lin: bool = False` keyword-only kwarg on HAD.fit(), joint_pretrends_test, and joint_homogeneity_test (PR #389; R-parity locked against DIDHAD::did_had(trends_lin=TRUE) v2.0.0 in test_did_had_parity.py); and (b) the PIERCE-SCHOTT NUMERICAL REPLICATION against the published p=0.51 anchor on the LBD-restricted panel, which IS waived per REGISTRY Deviations Note #3. Updated 3 surfaces (paper-review L194, the METHODOLOGY_REVIEW Eq. 18 Verified-Components row, and the test_methodology_had.py module docstring + TestHADJointStute class docstring) to distinguish "feature shipped + R-parity locked elsewhere" from "Pierce-Schott numerical replication waived".
- P3 (Documentation/Tests): the TestHADJointStute promotion narrative overstated H1 coverage as "H0 fail-to-reject and H1 reject on linear vs nonlinear DGPs" for both joint_pretrends_test and joint_homogeneity_test. Reality: H1 rejection is tested only on joint_homogeneity_test via a quadratic post-DGP; joint_pretrends_test gets H0-only coverage in this file (H1 would require a violating-pretrends fixture that re-verifies bootstrap calibration already covered by test_had_pretests.py). Narrowed the wording in the METHODOLOGY_REVIEW Verified-Components row + TestHADJointStute class docstring; the CHANGELOG entry is unchanged (its H1 reject claim explicitly cites the homogeneity side via "H1 reject under nonlinear DGP", which is accurate).

All 35 methodology tests pass; lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request on May 22, 2026
…GELOG H1 scope
- The R6 fix left METHODOLOGY_REVIEW.md Deviations item #6 stale (it only updated the Verified-Components row). Item #6 still said "Eq. 18 linear-trend-detrended joint Stute deferred". Rewritten to match the rest of the HAD tracker: trends_lin=True is SHIPPED + R-parity-locked in test_did_had_parity.py; the methodology-walkthrough file deliberately doesn't duplicate that coverage; the Pierce-Schott published-value numerical replication is what's waived (Deviations Note #3).
- R6 narrowed the Verified-Components row + class docstring but missed the CHANGELOG bullet, which still claimed "joint Stute pre-trends + homogeneity H0 fail-to-reject + H1 reject under nonlinear DGP". Narrowed to "H0 fail-to-reject on both surfaces and H1 reject for joint homogeneity under a nonlinear DGP", which matches the test file's actual scope.

All 35 methodology tests pass; lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
igerber added a commit that referenced this pull request on May 25, 2026
… to_dict()

ImputationDiDResults now exposes `vcov_type`, `cluster_name`, `n_clusters` and a new `to_dict()` method (Phase 1b interstitial #3), but the shared "Common Results Pattern for Staggered Estimators" section in llms-full.txt still listed only `summary()`, `print_summary()`, and `to_dataframe()`. Adds a variance-metadata table and threads `to_dict()` into the Methods line so AI-agent guide consumers can discover the surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request on May 25, 2026
Phase 1b interstitial #3 for ImputationDiD. Mirrors the CallawaySantAnna (PR #487) + TripleDifference (PR #488) template for IF-based estimators: vcov_type is permanently narrow to {"hc1"} because the per-unit influence function aggregation (Borusyak-Jaravel-Spiess 2024 Theorem 3) has no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined.

Source surface:
- diff_diff/imputation.py: vcov_type param + @staticmethod _validate_vcov_type + fit()-time revalidation + cluster+replicate-weights NotImplementedError guard + Results cluster_name/n_clusters resolution
- diff_diff/imputation_results.py: vcov_type/cluster_name/n_clusters fields + new to_dict() + variance-estimator line in summary() routed through the shared _format_vcov_label helper
- diff_diff/imputation_bootstrap.py: dual-site n_clusters<2 / n_psu<2 NaN guards via the new _build_nan_bootstrap_results helper (closes the BLAS-roundoff zero-SE class predicted to recur on IF-based estimators)

Tests: 34 new tests in TestImputationDiDVcovType covering default / cluster / TSL-survey / replicate-survey bit-equality (parametrized over aggregate modes), bootstrap × cluster + bootstrap × survey bit-equality, fit()-time revalidation after a set_params bypass, bootstrap n_psu<2 / n_clusters<2 NaN propagation, pretrends bit-equality, and the full introspection + safety-gate surface (8 tests).

Docs: REGISTRY.md (IF-based taxonomy + 4 new Notes), CHANGELOG.md, TODO.md (row narrowed, Conley follow-up added), llms-full.txt (vcov_type + pretrends signature drift fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
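A sketch of the narrow {"hc1"} contract and the n<2 NaN guard. The rejection messages are illustrative keyword-bearing text, not the library's, and `validate_vcov_type` / `bootstrap_se` are hypothetical stand-ins for _validate_vcov_type and the _build_nan_bootstrap_results path.

```python
import math
import statistics

# Hypothetical messages: each rejected option carries its own methodology keyword.
_REJECTED_VCOV = {
    "classical": "classical (non-robust) variance is not offered for IF-based estimators",
    "hc2": "hc2 needs hat-matrix leverage from a single design matrix",
    "hc2_bm": "hc2_bm needs Bell-McCaffrey Satterthwaite DOF from a single design matrix",
    "conley": "conley spatial variance is a separate follow-up",
}


def validate_vcov_type(vcov_type):
    """Accept only 'hc1'; reject known alternatives with a distinct reason each."""
    if vcov_type == "hc1":
        return vcov_type
    reason = _REJECTED_VCOV.get(vcov_type, f"unknown vcov_type {vcov_type!r}")
    raise ValueError(f"vcov_type={vcov_type!r} is not supported: {reason}")


def bootstrap_se(draws, n_units):
    """Bootstrap SE, or NaN when fewer than 2 clusters/PSUs exist.

    The explicit guard avoids reporting a roundoff-level "SE" near zero.
    """
    if n_units < 2 or len(draws) < 2:
        return math.nan
    return statistics.stdev(draws)
```

Calling the validator both at construction and again inside fit() is what catches a value injected later through set_params.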
igerber added a commit that referenced this pull request on May 25, 2026
… 3 (Phase 1b interstitial #3)

Phase 1b interstitial #3 for ImputationDiD. Mirrors the CallawaySantAnna (PR #487) + TripleDifference (PR #488) template for IF-based estimators: vcov_type is permanently narrow to {"hc1"} because the per-unit influence function aggregation (Borusyak-Jaravel-Spiess 2024 Theorem 3) has no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined.

Source surface (diff_diff/):
- imputation.py: vcov_type param + @staticmethod _validate_vcov_type + fit()-time revalidation + cluster+replicate-weights NotImplementedError guard + Results metadata resolution (cluster_name=unit by default for the Theorem 3 unit-clustered IF variance; suppressed under ANY survey design, analytical OR replicate, because replicate variance ignores cluster/PSU entirely)
- imputation_results.py: vcov_type/cluster_name/n_clusters fields, new to_dict() method, summary() variance line via the shared _format_vcov_label (the default cluster=None renders "CR1 cluster-robust at <unit>, G=<n>"; bootstrap fits suppress the analytical label and render "Inference method: bootstrap" instead, mirroring the DiDResults.summary() gate at results.py:213-226)
- imputation_bootstrap.py: dual-site n_clusters<2 / n_psu<2 NaN guards via the new _build_nan_bootstrap_results helper (closes the BLAS-roundoff zero-SE class predicted to recur on IF-based estimators)

Tests: 42 new tests in TestImputationDiDVcovType covering default / cluster / TSL-survey / replicate-survey + bootstrap × cluster + bootstrap × survey bit-equality (ALL parametrized over aggregate ∈ {None, "event_study", "group"} with per-horizon and per-group SE override branches pinned); fit()-time revalidation after a set_params bypass; bootstrap n_psu<2 / n_clusters<2 NaN propagation including coef_var NaN; pretrends=True × vcov_type='hc1' × cluster bit-equality; introspection (default attr, get_params, Results carries, to_dict, summary label default+cluster+bootstrap-suppressed, cluster_name suppression under both analytical AND replicate survey, fit-clone idempotence, convenience function); input rejection on classical/hc2/hc2_bm/conley/unknown with distinct methodology-keyword pins; cluster+replicate rejection. Full pytest tests/test_imputation.py: 128 passed.

Docs:
- REGISTRY.md: the IF-based taxonomy adds ImputationDiD to the "Enforced today" tier; the ImputationDiD section gains 4 new Notes (vcov_type contract, cluster+replicate fail-closed, bootstrap n<2 NaN, default unit-cluster CR1 rendering)
- CHANGELOG.md: [Unreleased] entry
- TODO.md: Phase 1b row narrowed to TwoStageDiD + EfficientDiD; ImputationDiD Conley follow-up row added
- guides/llms-full.txt: vcov_type + pretrends signature drift fix + the shared staggered-results section advertises the new variance metadata fields and to_dict()

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
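A sketch of the metadata-resolution and summary-label rules listed above. The dataclass, the helper names, and the non-clustered fallback label are assumptions; only the two quoted label strings come from the commit message.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VarianceMetadata:
    """Hypothetical container for the results' variance fields."""
    vcov_type: str = "hc1"
    cluster_name: Optional[str] = None
    n_clusters: Optional[int] = None

    def to_dict(self):
        return asdict(self)


def resolve_metadata(unit, cluster=None, survey_design=None, n_clusters=None):
    """Default to unit-level clustering; drop the cluster label under any survey design."""
    if survey_design is not None:
        return VarianceMetadata(cluster_name=None, n_clusters=n_clusters)
    return VarianceMetadata(cluster_name=cluster or unit, n_clusters=n_clusters)


def format_vcov_label(meta, bootstrap=False):
    """Summary line: bootstrap fits hide the analytical label entirely."""
    if bootstrap:
        return "Inference method: bootstrap"
    if meta.cluster_name is not None:
        return f"CR1 cluster-robust at {meta.cluster_name}, G={meta.n_clusters}"
    return "HC1 (no cluster label)"  # placeholder wording; not specified in the source
```

Keeping resolution separate from rendering lets to_dict() and summary() read the same resolved fields.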
igerber added a commit that referenced this pull request on May 25, 2026
…-phase1b
ImputationDiD: thread vcov_type as narrow {hc1} contract per BJS Theorem 3 (Phase 1b interstitial #3)